Overlapping MS/MS spectra and disease proteomics
Abstract
The ongoing success of the proteomics endeavor is the result of a prolific symbiosis between experimental ingenuity [2, 3, 4] and efficient bioinformatics [5, 6, 7, 8, 9, 10, 11]. Without these, ground-breaking landmarks such as the human genome project [12, 13] or the HUPO initiative [14] would likely not have seen the light of day. Despite these valuable contributions, progress toward a better understanding of disease proteomics is still hindered by significant difficulties in the extensive identification of post-translational modifications and in the sequencing of novel proteins such as cancer fusion proteins or antibody chains. Recently, tandem mass spectrometry (MS/MS) based approaches seemed to be reaching the limit on the amount of information that could be extracted from MS/MS spectra [15, 16, 17]. However, a closer look reveals that a common limiting procedure is to analyze each spectrum in isolation, even though high-throughput mass spectrometry regularly generates many spectra from related peptides. By capitalizing on this redundancy we have shown that, similarly to the alignment of protein sequences [5], unidentified MS/MS spectra can also be aligned to identify modified and unmodified variants of the same peptide. Moreover, this alignment procedure can be iterated for the accurate grouping of multiple peptide variants (Figure 1). The highly correlated peaks in spectra from variants of the same peptide allowed us to reliably identify all known and even some unknown modifications in a sample of cataractous lens proteins [18, 19]. Furthermore, the combination of shotgun proteomics [20] with the alignment of spectra from overlapping peptides led us to the development of Shotgun Protein Sequencing [21]. Similarly to the assembly of DNA reads into whole genomic sequences, we have shown that assembly of MS/MS spectra enables the highest ever de novo sequencing accuracy, while recovering over 85% of the target protein sequences [22] (Figure 2).
Similar mixtures of venom proteins have previously provided essential clues for the design of important drugs [23, 24]. Beyond providing the proof-of-concept for these methods, we are actively collaborating on quantifying drug- and age-induced changes in post-translational modifications, and on sequencing cancer fusion proteins, antibody light/heavy chains and unknown snake venom proteins. Additionally, our tools will be available to the community as open-source packages and web services.
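The idea of aligning a spectrum of an unmodified peptide with a spectrum of its modified variant can be illustrated with a toy example. The sketch below is not the published algorithm; it is a simplified illustration that assumes each spectrum has already been reduced to a list of prefix (b-ion-like) masses, that the two peptides differ by a single modification, and that the mass difference equals the difference in parent masses. The function names, the tolerance and the example masses are all hypothetical. Peaks before the modification site are expected to match unshifted, and peaks after it are expected to match when shifted by the parent mass difference.

```python
from bisect import bisect_left


def has_peak(sorted_masses, target, tol):
    """Return True if any mass lies within +/- tol of target."""
    i = bisect_left(sorted_masses, target - tol)
    return i < len(sorted_masses) and sorted_masses[i] <= target + tol


def align_one_mod(prefix_a, prefix_b, parent_a, parent_b, tol=0.5):
    """Toy alignment of two spectra that differ by one modification.

    Peaks of spectrum A below a split point must match spectrum B
    unshifted; peaks at or above it must match after adding the parent
    mass difference. Returns (best_score, split_index), where
    split_index is the number of peaks of A matched unshifted.
    """
    a = sorted(prefix_a)
    b = sorted(prefix_b)
    delta = parent_b - parent_a  # assumed mass of the modification

    unshifted = [has_peak(b, m, tol) for m in a]
    shifted = [has_peak(b, m + delta, tol) for m in a]

    # Start with every peak on the shifted side, then move the split
    # point to the right one peak at a time.
    shifted_tail = sum(shifted)
    unshifted_head = 0
    best = (shifted_tail, 0)
    for k in range(1, len(a) + 1):
        unshifted_head += unshifted[k - 1]
        shifted_tail -= shifted[k - 1]
        score = unshifted_head + shifted_tail
        if score > best[0]:
            best = (score, k)
    return best


# Hypothetical prefix masses for a four-residue peptide and for a
# variant carrying a +16.0 Da modification on its third residue.
plain = [57.02, 128.06, 215.09]
modified = [57.02, 128.06, 231.09]
score, split = align_one_mod(plain, modified, 328.17, 344.17)
print(score, split)
```

In this toy case all three peaks are explained once the third peak is allowed to shift by the parent mass difference, which is the kind of correlated-peak evidence the abstract refers to. Real spectra contain noise, missing peaks and both prefix and suffix ions, so a practical implementation would need a more careful scoring scheme.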
Related resources
Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpreta...
Shotgun Protein Sequencing: Assembly of Peptide Tandem Mass Spectra from Mixtures of Modified Proteins
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpreta...
Shotgun Protein Sequencing: Assembly of Tandem Mass Spectra from Mixtures of Modified Proteins
Despite significant advances in the identification of known proteins, the analysis of unknown proteins by tandem mass spectrometry (MS/MS) still remains a challenging open problem. Although Klaus Biemann recognized the potential of tandem mass spectrometry (MS/MS) for sequencing of unknown proteins in the 1980s, low-throughput Edman degradation followed by cloning still remains the main method ...
Modification-tolerant Shotgun Protein Sequencing of a Snake Venom Proteome
Despite the steady accumulation of fully sequenced genomes for model organisms, limited or no sequence information is available for most organisms. Moreover, natural mechanisms of variation such as accelerated mutation and combinatorial recombination in immunoglobulins regularly create novel sequences in the proteomes of model organisms. However, since protein identification via database search...
Protein identification by spectral networks analysis.
While advances in tandem mass spectrometry (MS/MS) steadily increase the rate of generation of MS/MS spectra, standard algorithmic approaches for peptide identification recently seemed to be reaching the limit on the amount of information that could be extracted from MS/MS spectra. However, a closer look reveals that a common limiting procedure is to analyze each spectrum in isolation, even tho...